-
-
Notifications
You must be signed in to change notification settings - Fork 150
remove protobuf for oTel ingestion #1406
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
remove protobuf for oTel ingestion #1406
Conversation
WalkthroughRemoves OTEL Protobuf ingestion support and unifies OTEL content handling into a single function that processes JSON and rejects Protobuf. Updates handlers to use the new signature, exposes setup_otel_stream publicly, makes LOG_SOURCE_KEY public, and exposes the validator module via lib.rs. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant HTTP Handler
participant OTEL Processor
participant Storage
Client->>HTTP Handler: OTEL ingest request (headers + body)
HTTP Handler->>OTEL Processor: process_otel_content(Content-Type, body, known fields)
alt Content-Type = JSON
OTEL Processor->>Storage: flatten_and_push_logs(parsed JSON)
OTEL Processor-->>HTTP Handler: Success/Result
else Content-Type = Protobuf
OTEL Processor-->>HTTP Handler: Error: Protobuf not supported
end
HTTP Handler-->>Client: Response
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. ✨ Finishing Touches
🧪 Generate unit tests
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🔭 Outside diff range comments (3)
src/handlers/http/ingest.rs (3)
176-183
: Panic risk: avoid unwrap() on header value parsing
to_str().unwrap()
can panic on invalid header bytes. Propagate a well-formed error instead.- let log_source = LogSource::from(log_source.to_str().unwrap()); + let log_source = LogSource::from( + log_source + .to_str() + .map_err(|_| PostError::Invalid(anyhow::anyhow!("Invalid {} header", LOG_SOURCE_KEY)))? + );
238-279
: Content-Type equality is too strict; accept parameters and fix error messagesMany clients send
application/json; charset=utf-8
. The current equality check will reject these valid requests. Also, error messages should not advertise protobuf as acceptable given it’s explicitly unsupported now.Apply this diff to robustly parse Content-Type and align error messages with protobuf removal:
- match req - .headers() - .get("Content-Type") - .and_then(|h| h.to_str().ok()) - { - Some(content_type) => { - if content_type == CONTENT_TYPE_JSON { + match req.headers().get("Content-Type").and_then(|h| h.to_str().ok()) { + Some(raw_content_type) => { + // Accept parameters like `; charset=utf-8` + let content_type = raw_content_type.split(';').next().unwrap_or("").trim(); + if content_type == CONTENT_TYPE_JSON { flatten_and_push_logs( - serde_json::from_slice(&body)?, + serde_json::from_slice(&body)?, stream_name, log_source, &p_custom_fields, ) .await?; - } else if content_type == CONTENT_TYPE_PROTOBUF { + } else if content_type == CONTENT_TYPE_PROTOBUF { return Err(PostError::Invalid(anyhow::anyhow!( - "Protobuf ingestion is not supported in Parseable OSS" + "Protobuf ingestion is not supported in Parseable OSS" ))); } else { return Err(PostError::Invalid(anyhow::anyhow!( - "Unsupported Content-Type: {}. Expected application/json or application/x-protobuf", - content_type + "Unsupported Content-Type: {}. Expected application/json", + raw_content_type ))); } } None => { return Err(PostError::Invalid(anyhow::anyhow!( - "Missing Content-Type header. Expected application/json or application/x-protobuf" + "Missing Content-Type header. Expected application/json" ))); } }
238-279
: No payload size guard: potential DoS via large OTEL JSON bodiesThe prior 10MB guard appears to have been removed with the protobuf path. Add a conservative upper bound for JSON bodies to avoid memory spikes.
async fn process_otel_content( req: &HttpRequest, body: web::Bytes, stream_name: &str, log_source: &LogSource, ) -> Result<(), PostError> { let p_custom_fields = get_custom_fields_from_header(req); + // Guard against excessively large payloads (10 MB) + const MAX_OTEL_JSON_BYTES: usize = 10 * 1024 * 1024; + if body.len() > MAX_OTEL_JSON_BYTES { + return Err(PostError::Invalid(anyhow::anyhow!( + "Payload too large: {} bytes (limit: {} bytes)", + body.len(), + MAX_OTEL_JSON_BYTES + ))); + } + match req .headers() .get("Content-Type") .and_then(|h| h.to_str().ok()) {If a global limit/config already exists elsewhere, wire it here instead of a hardcoded constant.
🧹 Nitpick comments (3)
src/lib.rs (1)
51-51
: Confirm intent of expanding public API surface (pub mod validator)Making validator public is a breaking surface change. If external crates aren’t meant to depend on internal validation details, consider
pub(crate)
or re-exporting only the required items instead of the entire module.Would you like me to prepare a targeted re-export (e.g.,
pub use validator::{StreamNameValidationError, stream_name};
) to minimize the public surface?src/handlers/http/ingest.rs (2)
166-171
: Visibility: consider pub(crate) for setup_otel_stream unless needed externallyIf this helper is only used within this crate, prefer
pub(crate)
to avoid unnecessary public API expansion. If you intend SDKs or integrators to call this directly, keep it public.-pub async fn setup_otel_stream( +pub(crate) async fn setup_otel_stream(
192-235
: Minor: setup_otel_stream returns LogSourceEntry but it’s unused by callersConsider returning only
(String, LogSource)
or reusing the entry in the caller to avoid unused data. This reduces allocations and clarifies intent.-) -> Result<(String, LogSource, LogSourceEntry), PostError> { +) -> Result<(String, LogSource), PostError> { @@ - Ok((stream_name, log_source, log_source_entry)) + Ok((stream_name, log_source))…and remove the unused binding at the call sites.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these settings in your CodeRabbit configuration.
📒 Files selected for processing (3)
src/handlers/http/ingest.rs
(7 hunks)src/handlers/mod.rs
(1 hunks)src/lib.rs
(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/handlers/http/ingest.rs (1)
src/validator.rs (1)
stream_name
(33-68)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: coverage
🔇 Additional comments (5)
src/handlers/mod.rs (1)
28-28
: Making LOG_SOURCE_KEY public looks goodExposing LOG_SOURCE_KEY aligns with the new OTEL flow that validates/reads this header across modules. No issues spotted.
src/handlers/http/ingest.rs (4)
38-40
: Good: switching to known field lists per OTEL signalUsing OTEL_LOG_KNOWN_FIELD_LIST/OTEL_METRICS_KNOWN_FIELD_LIST/OTEL_TRACES_KNOWN_FIELD_LIST simplifies stream setup and avoids protobuf-specific flattening logic. This aligns well with the PR objectives.
288-295
: Good: OTEL logs path now flows through common setup and content processingThe refactor reduces duplication and centralizes validation. Assuming the content-type check is relaxed as suggested, this should be robust for typical agents.
Also applies to: 296-296
308-315
: Good: OTEL metrics path parity with logsMirrors the logs path and enforces log-source alignment via known fields. Looks consistent.
Also applies to: 316-316
328-335
: Good: OTEL traces path parity with logs/metricsSame comments as above. Centralizing content processing simplifies future maintenance.
Also applies to: 336-336
Summary by CodeRabbit
New Features
Refactor
Chores